Genetic difference between two Schistosoma japonicum isolates with contrasting cercarial shedding patterns revealed by whole genome sequencing

Schistosoma japonicum is one of the major infectious agents of human schistosomiasis, mainly endemic in China and the Philippines. We have previously reported the finding of two schistosome isolates, each with a different cercarial emergence pattern adapted to their different hosts. However, there are currently no whole-genome sequencing studies to investigate the underlining genetics of the adaptive traits. We sampled schistosomes in 2013 and 2020 from a hilly area Shitai (ST) and a marshland area Hexian (HX) of Anhui, China. Ten to 15 male or female adult worms from each site/year were sent for whole genome sequencing. Genetics were analyzed, and selection signals along genomes were detected. Gene enrichment analysis was performed for the genome regions under selection. The results revealed considerable genetic differentiation between the two isolates. The genome “windows” affected by natural selection were fewer in ST (64 windows containing 78 genes) than in HX (318 windows containing 276 genes). Twelve significantly enriched genes were identified in ST, but none in HX. These genes were mainly related to specific DNA binding and intercellular signaling transduction. Some functional region changes identified along the genome of the hilly schistosome may be related to its unique late afternoon cercarial emergence.


Introduction
Schistosomiasis is a parasitic disease caused by trematode flukes of the genus Schistosoma and is estimated to affect at least 230 million people worldwide.Schistosoma japonicum (digenean trematode) is mainly endemic in China and partly in the Philippines [19,52].Schistosomiasis in humans in mainland China is caused by S. japonicum only [2].After nearly 70 years of schistosomiasis control in China, great achievements have been obtained, with infections in both humans and livestock reduced to a much lower level across the country [16].
As a consequence, the central government of China has set the goal to eliminate this disease by 2030 [3].However, many challenges remain, one of which may be attributable to its zoonotic transmission, as over 40 species of wild and domestic animals can serve as reservoir hosts for S. japonicum [19].
Schistosoma japonicum has a complex life cycle, involving an obligatory molluscan intermediate host of the genus Oncomelania and a mammalian definitive host.In fresh water, miracidia hatch from eggs, which are produced by sexual reproduction in definitive hosts, and actively penetrate a snail within which asexual reproduction occurs.The infected snail releases free-swimming cercariae that infect humans or reservoir hosts via skin penetration [6].Snail-to-mammal transmission depends on temporal coincidence between the cercariae and the vertebrate because vertebrates are generally in contact with the water for a short length of time, and cercariae have a short lifespan.Cercarial infectivity declines rapidly after a few hours postemergence from a snail, whereas their mortality increases [47].Therefore, the chronobiological characteristics of cercarial emission naturally play a paramount role in schistosome transmission.
Cercarial emergence rhythms are regarded as adaptive activities of parasites to maximize the probability of encountering the right host [7,28,31,41], and moreover, are probably genetically determined [35,42].Several studies have been carried out to show the intraspecific polymorphisms of cercarial emergence patterns [28,33,39,41].For example, on the island of Guadeloupe, S. mansoni cercariae with an early shedding pattern were observed in an urbanized area, where humans were highly infected; whereas cercariae with a late shedding pattern were observed in a sylvatic site, where rats were heavily infected [41].In Anhui (China), S. japonicum cercariae from a hilly region displayed a late afternoon emergence pattern consistent with a nocturnal rodent reservoir, while those from a marshland region showed an early pattern compatible with a diurnal cattle reservoir [28,39].The variations of cercarial emergence between schistosome isolates were considered to be related to different final hosts.A number of studies, based on COX1 or microsatellites [28,32,36,43], have attempted to elucidate any genetic differences between parasites with respect to their biological traits, definitive hosts, or possible habitat types.Evidence from morphological to molecular level indicated that S. japonicum in mainland China may comprise a strain complex, but the low-and middle-valley (including Anhui province) of the Yangtze River were endemic with the same S. japonicum strain [18].Recent advances in whole genome sequencing technology (WGS) make it feasible to study the genetic architecture of schistosomes with different traits [17,21], as genome-wide genetic variation can provide a more comprehensive and unbiased framework to identify genomic regions associated with the phenotypic traits.
We have previously reported that in Shitai (ST), a hilly/mountain area (HM) of Anhui, S. japonicum cercariae had a unique late afternoon emergence pattern, and the definitive host-parasite system was driven by local rodents in spite of high infections in humans [27,28,37].Moreover, our recent research suggested that there was an increased prevalence of S. japonicum infections in rodents in HM regions across China [53].We therefore hypothesized that the specific late-afternoon emergence pattern of S. japonicum was the result of ongoing selection posed by definitive host rodents.As there are no studies to explain the underlining genetics of this kind of adaptive traits with WGS, in this study we sequenced two S. japonicum isolates of Anhui, each with a different cercarial shedding pattern, and then analyzed their genetic relationships with the purpose of detecting any changes in nucleotide diversity and allele frequencies in response to the unique chronobiological characteristics.

Material and sequencing
Schistosoma japonicum were sampled from two sites in Anhui province, China.One site was in Shitai county (ST), a hilly area where S. japonicum was observed with a late afternoon cercarial shedding pattern and rodents were the main reservoir; another was in Hexian (HX), a lake/marshland area (LM) where S. japonicum was observed with a diurnal shedding of cercariae emergence [28] and bovines were the main reservoir.Details on the field samples used in this study are given in Table 1 and Figure 1.Briefly, intermediate host Oncomelania hupensis hupensis snails [9] were captured via routine field surveys in 2013 and 2020, and isolation of infected snails was performed by releasing cercariae in the laboratory.Infected snails were identified in Shitai in 2013 and 2020 and in Hexian in 2013 only.Cercariae released from infected snails of each site/year were separately and subsequently used to establish their corresponding S. japonicum isolates in mice, in which mice were individually exposed to cercariae from a single infected snail.Four weeks later, adult worms were perfused from infected mice and frozen in absolute ethanol, as a future genetic resource.The Ethics Committee of Soochow University reviewed and approved this research protocol (No. 81971957).Ten to 15 male or female adult worms, either batch of which derived from single infected snails, were selected from each site/year (i.e., ST2020m and ST2020f, for males and females from ST in 2020, respectively; ST2013m and ST2013f, for males and females from ST in 2013, respectively; HX2013m and HX2013f, for males and females from HX in 2013, respectively), and were then sent to Novogene (Tianjin, China) for whole genome sequencing.Pooled DNA was extracted using the DNeasy Blood & Tissue Kit (QIAGEN, Hilden, Germany), according to the manufacturer's instructions.Paired-end reads (2 Â 150 bp) for each sample were produced on a NovaSeq 6000 system (Illumina, San Diego, CA, USA).The data that support the findings of this study are openly available in the BioProject database at https://dataview.ncbi.nlm.nih.gov/, reference No. PRJNA793934.

Read mapping and variant calling
Raw reads were filtered with FASTP [4] with the parameters -q 5 -u 50 -n 15, and aligned to the reference genome (ASM636876v1) [29] with BWA-MEM (version 0.7.17) [24] with default parameters.The aligned bam files were then sorted using SAMtools [25], and PCR duplicate reads were marked with Picard-MarkDuplicates (http://broadinstitute.github.io/picard/).Mapping statistics were estimated with SAMtools.The Genome Analysis Toolkit (GATK) [10,30] was used for variant calling.By running HaplotypeCaller, genomic variant call format (GVCF) files were generated for each sample.These GVCF files were next combined into a single GVCF file, which was then used to identify single nucleotide polymorphisms (SNPs) and small indels with GenotypeGVCFs.SNPs and small indels were further filtered using the following criteria: ).SNPs and indels were categorized based on their positions on the genome (including intergenic regions, exons, introns, splicing sites, untranslated regions, transcripts, and 1-kb upstream and downstream regions) and on their effects (i.e., missense, nonsense, and silent mutations).Singleton SNPs in each sample were also calculated with VCFtools.

Genetic differentiation
Principal component analysis (PCA) was performed on the filtered SNP sets with PLINK [38].For each site/year, nucleotide diversity (p) was calculated within a 100-kb window sliding in 10-kb steps, and Tajima's D was calculated within a nonoverlapping 100-kb window with VCFtools [22].Pairwise genetic differentiation (F ST per gene between sampling sites/years) was calculated within a 100-kb window sliding in 10-kb steps using VCFtools [22].

Identification of selected regions in genomes
We conducted tests to detect selection signals along the genomes of ST and HX.The p ratios (p ST /p HX ) and F ST values were Note: HM, Hilly and mountainous areas; LM, Lake and marshland areas.calculated by the sliding-window approach (a 100-kb window sliding in a 10-kb step).To avoid spurious selection signals, windows containing fewer than 10 informative sites from both p and F ST analyses were discarded.We used an empirical procedure [26] and selected the windows simultaneously with significantly low or high p ratios (the 5% left or right tails of the distribution) and significantly high F ST values (the 5% right tail of the distribution).These windows were regarded as the regions with strong selective sweep signals, which should harbor candidate genes that undergo a selective sweep.Candidate genes of selective regions were identified using annotations of the reference [29] with Snpeff (version 5.0e) [5].In order to understand what functions these genes were involved in, we tested for enrichment of gene ontology (GO) terms by comparing to the reference genome with the clusterProfiler package [49].The Benjamini-Hochberg correction was applied and significantly enriched GO terms were identified at a corrected p-value of <0.05.

Sequencing and mapping
Six schistosome samples (i.e., ST2013m, ST2013f, ST2020m, ST2020f, HX2013m, and HX2013f) generated a total of 84.49Gb of raw sequencing data.After filtering sequences of low quality, 0.55 billion reads were left with an average mapping depth of 35Â on the reference genome.The mapping rate was between 95.70% and 96.27%, and the median coverage ranged from 93.58% to 95.18% (Table 2).

Variant identification
Single nucleotide variants varied among samples, mostly between two schistosome isolates.Heterozygous SNPs ranged from the lowest 2,093,493 in ST2013m to the highest 4,335,470 in HX2013m, while homozygous SNPs were in range of the lowest 727,770 in HX2013m to the highest 4,335,470 in ST2013m.The Ts/Tv (transitions/transversions) varied between 1.807 in ST2013f and ST2020m and 1.837 in HX2013f.The average Ts/Tv (transitions/transversions) of variation was 1.82.Singleton SNPs ranged from the lowest 196,368 in ST2013m to the highest 445,959 in HX2013f.Nonsynonymous SNPs varied between 39982 in HX2013m and 81449 in ST2020f (see Table 3).

Genetic differentiation
PCA analysis classified schistosome samples according to their sampling site and year, in which the percentage of the variation in genetic distances explained by each PC is 46.9 and 34.9% for PC1 and PC2, respectively.When samples were grouped based on sampling site and year (i.e., ST2013m and ST2013f into ST2013, ST2020m and ST2020f into ST2020, and HX2013m and HX2013f into HX2013), the pairwise genetic differentiation (F ST ) between HX2013 and ST2013 or ST2020 were 0.55 and 0.54, respectively, showing a great genetic difference; whereas the index between ST2013 and ST2020 was À0.01, indicating almost no difference between both.The values of nucleotide diversity (p) of ST2013 and ST2020 was almost the same, and so for the Tajima's D.Moreover, both p and Tajima's D were much lower when compared to HX2013 (see Table 4 and Fig. 2).We then further grouped samples based on sampling sites, namely ST and HX.The lower nucleotide diversity (p) was in ST (5.61 Â 10 À3 ) and higher in HX (8.48 Â 10 À3 ) (Fig. 3A).Selective neutrality, tested with Tajima's D, showed that ST had more windows observed at extremely high or low values.The overall value of Tajima's D computed along genomes in ST (0.29) was lower than that in HX (0.67) (Fig. 3B).Significant differences for p and Tajima's D value distributions between both sites were observed (p-value < 0.05; two-sample Kolmogorov-Smirnov test).F ST revealed a high degree of genetic differentiation between two sites (mean genome-wide F ST = 0.56) (Fig. 3C).

Genome-wide selective sweep analysis
A sliding-window approach was applied to identify selection signals on genomes.Out of 36,389 windows examined 424 were discarded as they contained fewer than 10 informative sites from both p value and F ST analysis.Strong selective sweep signals were detected in the remaining windows.The genome regions in ST affected by natural selection were fewer than in HX.In ST, strong signals were detected within 64 windows containing 78 genes; whereas in HX within 318 windows containing 276 genes (Fig. 4; Supplemental Tables S1 and S2).ST had lower levels of polymorphism than HX (median p ST /p HX = 0.658), and selection regions in each site showed higher levels of selection statistics (F ST and Tajima's D) (Supplemental Fig. S1).The results suggested that selection may exist in shaping the genome, resulting in changes in phenotypic and/or behavioral traits of schistosomes.However, no significantly enriched genes were identified in the selection regions in HX, but in ST about 12 significantly enriched genes were identified (Supplemental Table S3).The functions of those genes involve positive regulation of cell migration, cis-regulatory region sequence-specific DNA binding, positive regulation of transcription, DNA-templated, positive regulation of epithelial to mesenchymal transition, sequence-specific DNA binding and RNA polymerase II transcription factor binding (Fig. 5).These enriched genes showed a high level of genetic differentiation (average F ST = 0.91) and were mainly related to specific DNA binding and intercellular signaling transduction (Table 5).

Impact of sampling time on selection
To investigate whether the two different years of ST samples affect the above selective analysis, we further performed both FET and CMH tests to examine SNP differences on comparisons of ST2013 vs. HX2013 and ST2013 vs. ST2020 (Supplemental Fig. S2).For HX2013 vs. ST2013, FET yielded roughly 2.36 million SNPs in two merged groups, whereas CMH test contained 0.37 million SNPs throughout the comparisons.A high degree of concordance between FET and CMH (Pearson's correlation; r = 0.98) was identified when comparing the SNPs shared between the two tests (n = 0.36 million).As for ST2013 vs. ST2020, about 1.36 million SNPs were used in FET and 0.92 million in CMH test.Correlation analysis between FET and CMH tests revealed a high level of concordance (n = 0.19 million; Pearson's correlation; r = 0.96).Both FET and CMH tests between HX2013 and ST2013 included higher number of significant SNPs than between ST2013 and ST2020.Moreover, the SNPs of the enriched genes (red dots in Supplemental Fig. S2) showed differences between HX2013 and ST2013, but none between ST2013 and ST2020.Except for the SNPs of five genes (EWB00_003259, EWB00_008464, EWB00_008467, EWB00_010839, and EWB00_010840) which showed significant changes above the genome-wide correction in the FET but not in the CMH test, the SNPs of all previously found selective genes showed significant changes in both FET and CMH tests.The pairwise F ST of these genes showed a high level of difference between ST2013 and HX2013, while none between ST2013 and ST2020 (Supplemental Fig. S3).

Discussion
With the available genome maps of S. haematobium [23], S. mansoni [24], and S. japonicum [25,26], whole genome sequencing was used to reveal the population history of S. mansoni [27], to investigate introgression events of S. bovis and S. haematobium [28], and to evaluate the village-level relatedness and genetic diversity of S. japonicum [29].In this work, using whole-genome pooled sequencing (pool-seq), we sequenced and analyzed genomes of six S. japonicum samples from two sites, each endemic for S. japonicum with a different cercarial emergence pattern.More than 11.87 million SNPs were identified, and two S. japonicum isolates showed great genetic differentiation and contrasting diversities.Although the genome region "windows" affected by natural selection were fewer in ST than in HX, more involved genes were identified in ST.The selective regions and relative genes on the schistosome genome might be related to its unique chronobiological characteristics of cercarial emergence.
In this study, the results from whole genome sequences of S. japonicum showed significant genetic differentiation between two isolates.Rudge et al. [36] once reported strong genetic differentiation in S. japonicum between two schistosome isolates and suggested that this differentiation may be associated with the contrasting host reservoirs.Indeed, our previous work in Anhui identified that rodents may be the key host for maintaining S. japonicum transmission within HM regions, while bovines are key hosts only within LM regions [27,28,37].As cercarial emergence is a heritable trait, shaped by the behavior of definitive hosts [7,41,42], in the view of evolution, S. japonicum cercariae from the HM region with a late afternoon emergence pattern may be the result of adaptation to nocturnal rodent reservoirs, while those from the LM with an early pattern may be the result of adaptation to diurnal cattle reservoirs [28,39].The genetic differentiation observed between the two schistosome isolates in this work may possibly underline their different biological traits, caused by definitive hosts.
Zhao et al. [51] demonstrated that, based on mitochondrial DNA, a much larger degree of genetic variation was observed in low-lying LM regions along the middle and lower reaches of the Yangtze River than in the HM regions in the upper reaches of the River.Similarly, in our work, we observed higher genetic diversity in HX than in ST.The greater genetic diversity and higher positive Tajima's D in HX could result from balancing selection for the parasites, for example at the stage of asexual reproduction within intermediate host snails, as in the marshland annual floods may have brought about the widespread mixing and dispersal of snails across large geographical areas More selective signals were detected on schistosome genomes from HX than from ST, suggesting that HX might receive more selective pressure.This may be explained by the fact that bovines and humans are the key hosts in LM regions [37], and as a major source of infection, have been given high priority for control interventions [45]; whereas rodents are the key hosts in HM regions and are not able to be targeted due to logistic difficulties [23].Significantly enriched genes were identified in selection regions of ST only, suggesting that they might be related to the fact that the schistosomes may have long resided and been transmitted within local rodents.The number of enriched genes were up to 12 and their products are mainly related to transcription and intercellular signal transduction [11,13,15,40,44], such as zinc finger protein [11], transcriptional enhancer factor, mothers against decapentaplegic member 3 isoform 1, and mothers against decapentaplegic member 3 isoform 2 (SMAD) [40,46].The signaling pathways involving SMAD have components sharing high identity with mammalian orthologues.High expression of SMAD may, as identified in the liver of infected mice [50], imply that schistosomes can, in addition to utilizing their own pathways, exploit host growth factors as developmental signals [40].We thus proposed that schistosome from ST may strengthen its transcription activities to increase the opportunity to invade the right host and prepare to exploit key signaling pathways of their host for growth and metabolism.We further found that one of these enriched genes is responsible for the product of histone-lysine N-methyltransferase.This is of interest as histone modifying enzymes have been selected as targets for druggable epigenetic targets in early-stage schistosome drug discovery projects [34].In addition, the involved products also included suppressor of hairless protein (transcription factor partner of Notch receptor), which plays a possible role in Notch signaling in sensory architecture [44].Regarding sensory structures, Hoffmann et al. [20] found that some components of the light detection system, for example S. mansoni rhodopsin (SmRHO), were consistent with the responsiveness of cercariae to light.Their findings were reminiscent of cercarial emergence, a biological trait controlled to some extent by illumination [1,14].More studies would be needed to explore the role of Notch signaling in sensory architecture on S. japonicum in the hilly area.
In conclusion, we found that the hilly/mountainous S. japonicum isolate with late afternoon emergence was genetically different from the LM isolate with a morning shedding    pattern, although both sites are within the same province and with O. h.hupensis as intermediate hosts.Moreover, we identified certain functional region changes along the genome of the hilly isolate that might be related to the unique late shedding pattern of schistosome cercariae.

Figure 1 .
Figure 1.Map of geographical locations of research sites.

Figure 2 .
Figure 2. Box plots of diversity, Tajima's D of S. japonicum and their F ST .(A) Nucleotide diversity estimated in 100-kb windows sliding in 10-kb steps throughout the genome.(B) Tajima's D estimated within a nonoverlapping 100-kb window throughout the genome.(C) pairwise F ST computed in 100-kb windows sliding in 10-kb steps throughout the genome.

Figure 3 .
Figure 3. Box plots of diversity and relationship of two sample groups.(A) Nucleotide diversity estimated in 100-kb windows sliding in 10-kb steps throughout the genome.(B) Tajima's D estimated within a nonoverlapping 100-kb window throughout the genome.(C) F ST between computed in 100-kb windows sliding in 10-kb steps throughout the genome.

Figure 4 .
Figure 4. Distribution of p ratios (p ST /p HX ) and F ST values calculated in 100-kb windows sliding in 10-kb steps throughout the genome.Data points colored red and blue were identified as selected regions in ST (red dots) and in HX (blue dots), respectively.These points correspond to the 5% left and right tails of the empirical p ratio distribution, where the p ratios are 0.092 and 1.643, respectively (vertical dashed lines), and the 5% right tail of the empirical F ST distribution, where F ST is 0.965 (horizontal dashed line).

Figure 5 .
Figure 5. Gene ontology enrichment analysis of genes from genome regions with strong selective signals.

Table 1 .
Geographical information of S. japonicum samples.

Table 3 .
Statistics of SNPs of six samples.

Table 4 .
p and F ST of three groups.

Table 2 .
Summary of mapping statistics.